Replication instructions for Race, Representation, and Responsiveness by G. Agustin Markarian, Mackenzie Lockhart, Jacob Hacker, and Zoltan Hajnal.


* Datasets for analysis are saved in the folder "Data" (codebooks for these files are in "Codebook.txt"). The primary files of interest are:
	- policy_outcomes_withissueareas_AM25.csv: Policy responsiveness data, analyzed in "policy_outcomes FINAL.rmd"
	- house_vote_level_data_census.csv: House dyadic representation data, analyzed in "Dyadic Representation FINAL.rmd"
	- senate_vote_level_data_census.csv: Senate dyadic representation data, analyzed in "Dyadic Representation FINAL.rmd"

* Primary analysis code files:
	- policy_outcomes FINAL.rmd
		+ Produces main Figures 1 & 3-7 
		+ Produces additional tables and figures in SI
	- Dyadic Representation FINAL.rmd
		+ Produces main Figures 8-11 
		+ Produces additional tables and figures in SI
	- Figure 2.R
		+ Produces Figure 2
		+ Note: this code calls "Data/racial_opinion_gaps_edited.csv", a hand-edited version of the file "Data/racial_opinion_gaps.csv" created by "Data Process Code/08_support_byrace.R". The edits add categories for CCES roll call votes.
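
Before knitting the analysis files, it can help to confirm that the inputs named above are present. The following is a minimal sketch, assuming the working directory is the project root. The helper name check_inputs is hypothetical and is not part of the archive.

```r
# Hypothetical helper (not part of the replication archive): report which of
# the input files named in this README exist under `data_dir`.
check_inputs <- function(data_dir = "Data") {
  files <- c(
    "policy_outcomes_withissueareas_AM25.csv",
    "house_vote_level_data_census.csv",
    "senate_vote_level_data_census.csv",
    "racial_opinion_gaps_edited.csv"
  )
  found <- file.exists(file.path(data_dir, files))
  names(found) <- files
  found
}

# List any inputs that are missing relative to the current working directory.
missing <- names(which(!check_inputs()))
if (length(missing) > 0) {
  message("Missing from Data/: ", paste(missing, collapse = ", "))
}
```

If a file is reported missing, the most likely cause is that the session was not started from the R Project, so relative paths do not resolve.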

Notes:

1) It is best to process all code within the R Project: "Policy Representation CCES.Rproj" using:	
	1A) R version 4.3.1 (2023-06-16 ucrt)
	1B) RStudio 2023.06.1+524 "Mountain Hydrangea" Release (547dcf861cac0253a8abb52c135e44e02ba407a1, 2023-07-07) for Windows, Mozilla/5.0 (Windows NT 10.0; Win64; x64)
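
To see whether a session matches the R version recorded above, a check along these lines may be useful. It is an illustrative sketch only: the function name matches_documented_r is hypothetical, and a mismatch does not necessarily mean the code will fail.

```r
# R version recorded in this README. The check itself is illustrative and
# not part of the archive.
expected_r <- "4.3.1"

# TRUE when the running (or supplied) R version equals the documented one.
matches_documented_r <- function(actual = getRversion(), expected = expected_r) {
  actual == package_version(expected)
}

if (!matches_documented_r()) {
  message(
    "Running R ", as.character(getRversion()),
    "; the archive was built with R ", expected_r
  )
}
```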

2) The primary datasets for analysis were created using code in the "Data Process Code" folder. Code should be run in order of its numeric prefixes, from "00_codebook_and_data_merger_ReligDesc.R" through "08_support_byrace.R", to reproduce the data from raw inputs.
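
Running the scripts in prefix order can be sketched as follows. This assumes every script in the folder starts with a two-digit prefix followed by "_" and ends in .R or .r, and that the session is started from the project root. The helper names ordered_scripts and run_pipeline are hypothetical and not part of the archive.

```r
# Hypothetical helper: list the numbered scripts in `dir` in prefix order.
# Files without a two-digit prefix (for example, helper files) are ignored.
ordered_scripts <- function(dir = "Data Process Code") {
  scripts <- list.files(dir, pattern = "^[0-9]{2}_.*\\.[Rr]$", full.names = TRUE)
  prefix <- as.integer(substr(basename(scripts), 1, 2))
  scripts[order(prefix, basename(scripts))]
}

# Run each script in turn. source() stops at the first error, so later steps
# do not run on incomplete intermediate data.
run_pipeline <- function(dir = "Data Process Code") {
  for (script in ordered_scripts(dir)) {
    message("Running ", basename(script))
    source(script, echo = FALSE)
  }
  invisible(TRUE)
}
```

Calling ordered_scripts() first, without running anything, is a cheap way to confirm that the order matches the one described in this note before calling run_pipeline().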

3) Raw C(C)ES files are in the folder "CCES Data".

4) SI-C Code 1 and SI-C Code 2 leverage data from Hill and Huber (2019) for robustness tests in the SI.

5) To ensure maximum transparency, we include nearly all files and code used to construct the datasets from raw inputs. Some files were created entirely by hand or include hand-coded additions to their raw forms (for example, the file "Intermediate Data/policy_outcomes_and_note.xls," which tracks policy outcomes and bridges roll-call votes). In a few exceptional cases, we provide modestly edited files derived either from supplementary tests or from variables in the CES.
	5A) Additional raw, supplemental, and hand-coded data files are available in the folder "Intermediate Data."
	5B) This folder also includes mid-process data produced by code in the "Data Process Code" folder prior to the final data outputs.
	5C) A non-exhaustive summary of additional files is below:
	    i. U.S. Census data for states and districts, gathered through IPUMS-NHGIS, is included in "Intermediate Data/Census Data." Each subfolder contains multiple .csv and .txt files in their original formats. The .csv files provide the census data in panel format. The .txt files are codebooks and correspond to the .csv files with the same name (minus "_codebook"). The data is cleaned and compiled by the code “CCES Representation Census Cleanup Code.r,” creating the two additional files in this folder “DistrictCensusData_CCES_070323.csv” and “StateCensusData_CCES_070323.csv” which contain merged and cleaned district and state census data.
	      - Folders with prefixes "nhgis0068" and "nhgis0078" contain congressional-district data.
	      - Folders with prefixes "nhgis0069" and "nhgis0070" contain state-level data.
	    ii. Congressional agenda data, including raw and hand-coded material used to compare CES policy questions to the congressional agenda and other accounts of important bills, is in "Intermediate Data/Congressional Agenda." Files here are either raw datasets (.csv and .xlsx) from different sources or were created by research assistants under the supervision of Jacob Hacker in an iterative process. The raw sources include all bills, all public laws, the Congressional Quarterly Almanac (Congressional Quarterly, Congressional Quarterly Almanac), Mayhew’s “important enactments” (Mayhew 2022), and Curry and Lee’s “congressional majority party agenda” bills (Curry and Lee 2022). These datasets were used to compare the CES questions against other accounts of important legislation and the greater congressional agenda to better understand how representative and generalizable our dataset is. Additionally, through these sources, we code the topic area of the CES questions we analyze. The data here, particularly those found in the subfolder "Final datasets and code," are used to construct Figure 2 in the main text using the code Figure 2.r, along with Figures SI-E, SI-F1, and SI-F2.
	    iii. MC roll-call vote data downloaded from Voteview.com is located in "Intermediate Data/Congressional Roll Call Votes." As additional CES years were incorporated and policy bridges (linking CES questions to specific laws/bills) improved, we acquired new roll-call vote data from Voteview.com. This resulted in four separate files, each containing a different set of roll-call votes, which we then merged using "Data Process Code/01_merge_RCV.r".
	    iv. Hill and Huber (2019) survey data, used to construct the dataset for SI-C, is located in "Intermediate Data/HillHuberSurveyMeasure_ReplicationArchive." This is the raw data used in their paper "On the Meaning of Survey Reports of Roll-Call 'Votes'." The data can also be obtained directly from the source at: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/SSEN5A
	    v. Other supplemental files in "Intermediate Data" are more idiosyncratic. Examples include:
	        - "aip_states_ideology_v2022a.dat": State ideology data from Warshaw and Tausanovitch (2022), used in SI-, in raw format. This dataset contains MRP estimates of state ideology. More information on how these data are constructed can be found at: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/BQKU4M
                - "Racial Resentment Senate" / "Racial Resentment House Post2012" / "Racial Resentment Pre2012": These are average racial-resentment scores by state or district used in Figure 11, calculated using the CES. We average responses to the two most common racial-resentment questions and pool them across years to calculate average state- and district-level racial resentment for White respondents. We split House results into pre- and post-2012 years because decennial redistricting affected nearly all states. We do not adjust for post-decennial redistricting that affected only a few states due to court orders.
		- "legislators-current.csv" / "legislators-historical.csv": Data on legislators serving during the study period, with current legislators (as of 2022) in one file and historical legislators in the other, acquired via voteview.com. This enables us to build a panel of legislators and link respondent opinions to the correct legislator based on the year the associated vote occurred, rather than the year the CES question was asked.
		- "dontknow_or_skip_min_by_year.csv": This file summarizes the CES roll-call vote opinion questions that had the lowest combined "don't know" or "skipped" response rates each year. It is used in robustness tests in SI-C. Because CES practices regarding "don't know"/skips (with or without soft or hard requirements) changed over time, we calculate within-year cutoffs above/below the median as well as pooled cutoffs. The roll-call vote questions listed there can be linked to the main datasets using "year_vote." The variable "dk" is the share of respondents per question answering "don't know" or skipping the question.
		- "senate_party_splits.csv": Similar to "dontknow_or_skip_min_by_year.csv," this file contains a list of roll-call votes used in robustness tests in SI-C. The variable "perc_yes" calculates the percent of legislators voting "yes" on each roll-call vote, by party (Democrats: dem_leg == 1; Republicans: dem_leg == 0). These votes can be linked to the main datasets using "year_vote."
		- All other files are likely intermediate outputs created during the data-generation process by code in the "Data Process Code" folder.
	5D) Questions regarding the "Intermediate Data" supplemental files, including how they were created or where they were acquired, can be directed to: gmarkarian@luc.edu
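
The within-year and pooled median cutoffs described for "dontknow_or_skip_min_by_year.csv" can be illustrated with a small sketch. The values below are made up and are not from the archive. Only the column names "year_vote" and "dk" come from the description above; the "year" column, the "year_vote" placeholder values, and the flag names are assumptions. The sketch does not assert which side of the median the SI-C tests keep.

```r
# Toy data: NOT values from the archive.
dk_toy <- data.frame(
  year_vote = c("2010_a", "2010_b", "2010_c", "2012_a", "2012_b", "2012_c"),
  year      = c(2010, 2010, 2010, 2012, 2012, 2012),
  dk        = c(0.10, 0.20, 0.30, 0.05, 0.15, 0.40)
)

# Flag questions whose "don't know"/skip share is below the median,
# computed within each year and over all years pooled.
flag_low_dk <- function(df) {
  within_median <- ave(df$dk, df$year, FUN = median)
  df$low_dk_within <- df$dk < within_median
  df$low_dk_pooled <- df$dk < median(df$dk)
  df
}

flagged <- flag_low_dk(dk_toy)

# Link the flags to a (toy) main dataset by "year_vote".
main_toy <- data.frame(year_vote = c("2010_a", "2012_c"), support = c(0.6, 0.4))
linked <- merge(
  main_toy,
  flagged[, c("year_vote", "low_dk_within", "low_dk_pooled")],
  by = "year_vote", all.x = TRUE
)
```

The two flags can disagree: in the toy data, the question with dk = 0.15 sits at its own year's median but below the pooled median, which is why the description above distinguishes within-year from pooled cutoffs.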
	 



